"They tested four well-known models, including GPT-3. The best was truthful on 58% of questions, while human performance was 94%. The models 'generated many false answers that mimic popular misconceptions and have the potential to deceive humans'. Interestingly, they also found that 'the largest models were generally the least truthful'."